6 Common Probability Inequalities

1 Boole's Inequality

By additivity over a disjoint decomposition we can compute $P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2)$.
More generally we have the Inclusion-Exclusion Principle:
$$P\left(\bigcup_{i=1}^n A_i\right) = \Sigma_1 - \Sigma_2 + \Sigma_3 - \Sigma_4 + \cdots + (-1)^{n-1}\Sigma_n, \tag{1.1}$$
where $\Sigma_k = \sum_{1 \le i_1 < \cdots < i_k \le n} P(A_{i_1} \cap \cdots \cap A_{i_k})$, which can be proved by induction from the definition. However, when only an upper bound is needed, there is a neater result:

Theorem (Union Bound, Boole's Inequality)

Let $A_1, \dots, A_n \in \mathcal{F}$ be a collection of events on a probability space $(\Omega, \mathcal{F}, P)$. Then $P\left(\bigcup_{i=1}^n A_i\right) \le \sum_{i=1}^n P(A_i)$.

This can be proved by induction on $n$ using (1.1), so the proof is omitted.
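As a quick numerical illustration of the union bound, the following sketch (with a hypothetical example of three independent biased coins, chosen purely for this demo) compares the exact union probability with the sum of the individual probabilities:

```python
# Hypothetical demo: A_i = "coin i shows heads" for three independent biased coins.
# For independent events, P(A_1 ∪ A_2 ∪ A_3) = 1 - prod_i (1 - P(A_i)).
p = [0.1, 0.2, 0.3]  # P(A_i); independence is an assumption of this demo

exact = 1.0
for pi in p:
    exact *= (1.0 - pi)
exact = 1.0 - exact          # exact union probability

union_bound = sum(p)         # Boole: P(∪ A_i) <= Σ P(A_i)

print(f"exact = {exact:.4f}, union bound = {union_bound:.4f}")
assert exact <= union_bound  # the bound always holds
```

The bound (0.6) overshoots the exact value (0.496) because it double-counts the intersections that inclusion-exclusion subtracts back out.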

2 Cauchy-Schwarz Inequality

Theorem (Cauchy-Schwarz Inequality)

Let $X, Y$ be two RVs on the same probability space. Then $(E[XY])^2 \le E[X^2]\,E[Y^2]$. Applying this to the centered variables $X - E[X]$ and $Y - E[Y]$ gives $(\operatorname{Cov}(X,Y))^2 \le \operatorname{Var}(X)\,\operatorname{Var}(Y)$.
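A small Monte Carlo sketch can check the inequality numerically. The choice of distributions here ($X$ uniform on $[0,1]$, $Y$ a noisy copy of $X$) is an arbitrary assumption for the demo; note that Cauchy-Schwarz also holds exactly for the empirical averages, since they are inner products of sample vectors:

```python
import random

random.seed(0)
n = 100_000
# Hypothetical RVs for the demo: X ~ Uniform[0,1], Y = X + noise.
xs = [random.random() for _ in range(n)]
ys = [x + 0.5 * random.random() for x in xs]

def mean(v):
    return sum(v) / len(v)

E_XY = mean([x * y for x, y in zip(xs, ys)])
E_X2 = mean([x * x for x in xs])
E_Y2 = mean([y * y for y in ys])

# Cauchy-Schwarz: (E[XY])^2 <= E[X^2] E[Y^2]
print(f"(E[XY])^2 = {E_XY**2:.4f} <= E[X^2]E[Y^2] = {E_X2 * E_Y2:.4f}")
assert E_XY ** 2 <= E_X2 * E_Y2
```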

3 Concentration Inequalities

We often want to estimate the tail of a distribution, i.e. $P(X \ge c)$ or $P(|X - \mu| \ge c)$. This is important for proving convergence results, bounding failure probabilities, and establishing probabilistic bounds on runtimes.

3.1 Markov's Inequality

Theorem (Markov's Inequality)

Let $X$ be a non-negative RV with $E[X] < \infty$. Then for any constant $c > 0$, $P(X \ge c) \le \dfrac{E[X]}{c}$.

Theorem (Generalized Markov's Inequality)

Let $X: \Omega \to \mathbb{R}$ be an arbitrary RV. Then for all constants $c > 0$ and $k > 0$, $P(|X| \ge c) \le \dfrac{E[|X|^k]}{c^k}$.
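The following sketch compares an empirical tail to the Markov bound and its generalized ($k = 2$) form; the exponential distribution is an arbitrary choice for this demo. For any sample, the empirical versions of these bounds hold exactly, since each observation $x \ge c$ contributes at least $c$ to the sample mean:

```python
import random

random.seed(1)
n = 200_000
# Hypothetical non-negative RV for the demo: X ~ Exponential(1), so E[X] = 1.
xs = [random.expovariate(1.0) for _ in range(n)]

c = 3.0
tail = sum(1 for x in xs if x >= c) / n            # empirical P(X >= c)
markov = (sum(xs) / n) / c                         # Markov: E[X]/c
markov_k2 = (sum(x * x for x in xs) / n) / c ** 2  # generalized, k = 2: E[X^2]/c^2

print(f"P(X>=3) ≈ {tail:.4f}, Markov <= {markov:.4f}, k=2 bound <= {markov_k2:.4f}")
assert tail <= markov and tail <= markov_k2
```

The true tail is $e^{-3} \approx 0.05$; Markov gives roughly $1/3$, while the $k = 2$ bound ($E[X^2]/c^2 = 2/9$) is tighter here, illustrating how higher moments can sharpen the bound.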

3.2 Chebyshev's Inequality

Theorem (Chebyshev's Inequality)

For all RVs $X$ with $E[X] = \mu < \infty$ and for all constants $c > 0$, $P(|X - \mu| \ge c) \le \dfrac{\operatorname{Var}(X)}{c^2}$.
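A numerical sketch of Chebyshev's inequality; the uniform distribution is an arbitrary assumption for this demo. As with Markov, the empirical version holds exactly for any sample, because each observation with $|x - \bar\mu| \ge c$ contributes at least $c^2$ to the sample variance:

```python
import random

random.seed(2)
n = 200_000
# Hypothetical RV for the demo: X ~ Uniform[0,1]; mu = 0.5, Var(X) = 1/12.
xs = [random.random() for _ in range(n)]
mu = sum(xs) / n
var = sum((x - mu) ** 2 for x in xs) / n

c = 0.4
tail = sum(1 for x in xs if abs(x - mu) >= c) / n  # empirical P(|X - mu| >= c)
cheb = var / c ** 2                                # Chebyshev bound Var(X)/c^2

print(f"P(|X-mu|>=0.4) ≈ {tail:.4f}, Chebyshev <= {cheb:.4f}")
assert tail <= cheb
```

For this distribution the true tail is $0.2$, while the bound is $(1/12)/0.16 \approx 0.52$: valid but loose, which motivates the sharper exponential bounds below.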

3.3 Chernoff Inequalities

Theorem (Chernoff Inequalities)

A one-parameter family of bounds obtained by applying Markov's inequality to $e^{tX}$, where $M_X(t) = E[e^{tX}]$ is the moment generating function of $X$:

  1. For all $t > 0$ and $c \in \mathbb{R}$, $P(X \ge c) \le M_X(t)e^{-tc}$; optimizing over $t$ gives $P(X \ge c) \le \min_{t>0} M_X(t)e^{-tc}$.
  2. For all $t < 0$ and $c \in \mathbb{R}$, $P(X \le c) \le M_X(t)e^{-tc}$; optimizing over $t$ gives $P(X \le c) \le \min_{t<0} M_X(t)e^{-tc}$.
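The "min over $t$" step can be illustrated numerically. Taking $X$ standard normal (an assumption for this demo), $M_X(t) = e^{t^2/2}$, so the upper-tail Chernoff bound is $\min_{t>0} e^{t^2/2 - tc} = e^{-c^2/2}$, attained at $t = c$; a grid search recovers the closed form:

```python
import math

# Chernoff bound for a standard normal (demo assumption): M_X(t) = exp(t^2/2),
# so P(X >= c) <= min_{t>0} exp(t^2/2 - t c) = exp(-c^2/2), minimized at t = c.
c = 2.0
ts = [0.01 * k for k in range(1, 1001)]  # grid over t in (0, 10]
chernoff = min(math.exp(t * t / 2.0 - t * c) for t in ts)

closed_form = math.exp(-c * c / 2.0)
print(f"grid min = {chernoff:.6f}, closed form exp(-c^2/2) = {closed_form:.6f}")
assert abs(chernoff - closed_form) < 1e-3
```

Here the Chernoff bound is $e^{-2} \approx 0.135$, already far sharper than the trivial Chebyshev bound $1/c^2 = 0.25$ at the same threshold.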

3.4 Hoeffding's Inequality

This inequality is specific to bounded RVs.

Theorem (Hoeffding's Inequality)

Let $X_1, \dots, X_n$ be independent RVs with $E[X_i] = \mu_i < \infty$ and $P(a_i \le X_i \le b_i) = 1$ for some constants $a_i, b_i \in \mathbb{R}$. Let $S_n = X_1 + \cdots + X_n$. Then for all $\varepsilon > 0$,
$$P(|S_n - E[S_n]| \ge \varepsilon) \le 2\exp\left[-\frac{2\varepsilon^2}{\sum_{i=1}^n (b_i - a_i)^2}\right].$$
Equivalently, for the sample mean,
$$P\left(\left|\frac{S_n}{n} - \frac{E[S_n]}{n}\right| \ge \varepsilon\right) \le 2\exp\left[-\frac{2\varepsilon^2 n^2}{\sum_{i=1}^n (b_i - a_i)^2}\right].$$

Hoeffding's bound depends only on the range of each $X_i$, not on its distribution over $[a_i, b_i]$. Using more information about $X_i$, as the Chernoff method does, can lead to a sharper bound.

Given the above two lemmas, we can prove Hoeffding's inequality.
First,
$$E\left[\exp\left(t\left(S_n - \sum_{i=1}^n \mu_i\right)\right)\right] = E\left[\prod_{i=1}^n e^{t(X_i - \mu_i)}\right] = \prod_{i=1}^n E\left[e^{t(X_i - \mu_i)}\right] \le \prod_{i=1}^n e^{\frac{t^2}{2}\left(\frac{b_i - a_i}{2}\right)^2} = e^{\frac{t^2}{2}\sum_{i=1}^n \left(\frac{b_i - a_i}{2}\right)^2},$$
where the second equality uses independence and the "$\le$" is by Lemma 2. Hence $S_n$ is sub-Gaussian with variance proxy $\sigma^2 = \sum_{i=1}^n \frac{(b_i - a_i)^2}{4}$.
Next, applying Lemma 1 to $X = S_n$ with this $\sigma^2$ completes the proof.
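To see the sample-mean form of the bound in action, the sketch below uses a hypothetical setup of fair coin flips ($X_i \in \{0,1\}$, so $a_i = 0$, $b_i = 1$ and the bound reduces to $2e^{-2\varepsilon^2 n}$) and compares it against the empirical deviation frequency over many independent trials:

```python
import math
import random

random.seed(3)
# Demo assumption: n fair coin flips per trial, X_i in {0,1} (a_i = 0, b_i = 1).
n = 500
trials = 4_000
eps = 0.05  # deviation threshold for the sample mean S_n/n from 1/2

# Empirical P(|S_n/n - 1/2| >= eps), estimated over independent trials.
hits = 0
for _ in range(trials):
    s = sum(random.random() < 0.5 for _ in range(n))
    if abs(s / n - 0.5) >= eps:
        hits += 1
empirical = hits / trials

# Hoeffding (sample-mean form): 2 exp(-2 eps^2 n^2 / Σ (b_i - a_i)^2) = 2 exp(-2 eps^2 n)
hoeffding = 2 * math.exp(-2 * eps * eps * n)

print(f"empirical ≈ {empirical:.4f}, Hoeffding bound = {hoeffding:.4f}")
assert empirical <= hoeffding
```

The empirical deviation frequency (around 0.025 here) sits well below the Hoeffding bound ($2e^{-2.5} \approx 0.164$), consistent with the bound being distribution-free and hence conservative.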